(54) MEDIA CONVERSION METHOD AND MEDIA CONVERSION DEVICE
(57)Abstract: 

PROBLEM TO BE SOLVED: To solve the problem that a conventional image format
cannot be applied directly when playing back contents from the middle, or to a
stream encoded in real time.

SOLUTION: In this method and device, the moof 23 part of an original stream 53
is converted to moov 71 in the conversion processing part 56 of a delivery server.
By this means, the server side can produce a stream that starts from the middle
of the original contents with only a small amount of processing, and the
terminal side can play back from the middle without changing its existing
stream reception and playback processing.
CLAIMS



[Claim(s)] 

[Claim 1] A media conversion method characterized in that: a code is input
which consists of header information, media access information that is
subdivided and arranged following this header information, and media data
corresponding to this media access information; playback starting position
information is further input; new header information is generated from said
header information and the media access information corresponding to said
playback starting position information; and a new code is generated and output
from this new header information and from the media access information and
media data that follow the starting position corresponding to said playback
starting position information.

[Claim 2] The media conversion method according to claim 1, characterized by
accumulating said header information, treating a request to generate a code as
the input of said playback starting position information, and generating and
outputting a new code.

[Claim 3] A media conversion method characterized in that: two or more codes
with differing properties, each consisting of header information and of
subdivided and arranged media access information and media data, are input
together with environmental information for selecting one code from said two or
more codes; selection information that chooses one media access information and
media data is generated based on said environmental information; one media
access information and media data is selected, using said selection
information, for every subdivided media access information and media data; new
header information is generated from said header information and from the media
access information at the head of the single sequence of media access
information and media data obtained by said selection; and that sequence,
together with the generated header information, is output as one new code.
[Claim 4] The media conversion method according to claim 3, characterized in
that the two or more codes with differing properties are codes that differ from
one another in bit rate.

[Claim 5] The media conversion method according to claim 3 or claim 4,
characterized in that the processing which generates the selection information
that chooses one media access information and media data based on said
environmental information is executed once, at the start of one media
conversion.

[Claim 6] Media distribution equipment characterized by having: an encoding
processing section which encodes an image and outputs an original stream; an
accumulation processing section which accumulates the original stream from said
encoding processing section; a transform-processing section which receives the
input of a playback start time and, by generating new header information,
generates a conversion stream starting from a time near the specified point in
the middle of the original stream accumulated in said accumulation processing
section; and a message distribution processing section which distributes the
conversion stream.



DETAILED DESCRIPTION 

[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to servers for video delivery
over the Internet, and in particular to techniques for distributing an image
file from the middle and for distributing a real-time image through a
distribution server.



[0002] 

[Description of the Prior Art] When a server distributes image or audio data
("audio" is used hereafter to mean voice or audio) to a terminal over a
transmission line in response to a request from the terminal, a system layer is
needed that multiplexes, as one data stream, the image data, the audio data,
and the synchronization information indicating the playback timing of each
medium (image and audio). Conventionally, one method that specifies such a
system layer and synchronization information is the file format defined by
ISO/IEC 14496-1 (hereafter, MP4 format). As shown in drawing 1, the MP4 format
consists of an incidental information part 11 called moov and an encoded media
data part 12 (image data or audio data) called mdat. As further shown in
drawing 2, moov 11 consists of header information for each medium (hereafter,
header information 13) and the storage location and playback time information
(time stamp) of each medium (hereafter, media access information 14). Header
information 13 describes the number of images contained in the subsequent data,
the image size, the coding method, the bit rate, and so on. Media access
information 14, on the other hand, stores the storage position of every
playback unit (hereafter, access unit: AU) of the image (or audio) data in
mdat 12, together with the playback time of each AU.

[0003] Consider a file in the MP4 format of drawing 1 that is distributed over
a transmission line, with the receiving terminal playing back the image in
parallel with reception, partway through receiving the file. The terminal must
first read all of the moov 11 data, including the part that is not needed to
play back the head of the file, so the delay from the start of file reception
to the start of playback increases. A known way to reduce this delay, shown in
drawing 3, is to subdivide the contents into short-time contents and to arrange
the media access information and media data corresponding to each short-time
content alternately through the file; that is, to distribute the information
across a moov 21 at the head and two or more moofs 23 and 25. When moov is
distributed into one moov and one or more moofs, the structure of moov is as
shown in drawing 4, and the presence of information 32 about moof indicates
that moofs follow. As shown in drawing 5, moof consists of the serial number 41
of that moof and the media access information (data location and time stamp)
for each medium contained in the mdat that follows that moof.
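
The layout described in this paragraph can be pictured with a small data model. The following is only an illustrative sketch: the class names, field names, and the `moof_follows` flag (standing in for information 32) are assumptions made for this example and are not taken from ISO/IEC 14496 or from the drawings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessEntry:
    """Media access information for one access unit (AU)."""
    offset: int        # data location of the AU inside its mdat
    timestamp_ms: int  # playback time (time stamp) of the AU


@dataclass
class Moov:
    header: dict                # header information (codec, image size, ...)
    entries: List[AccessEntry]  # access information for the first mdat
    moof_follows: bool = False  # hypothetical stand-in for information 32


@dataclass
class Moof:
    sequence_number: int        # serial number 41
    entries: List[AccessEntry]  # access information for the mdat that follows


@dataclass
class FragmentedFile:
    moov: Moov
    head_mdat: bytes
    fragments: list = field(default_factory=list)  # (Moof, mdat bytes) pairs

    def layout(self) -> List[str]:
        """Return the box order as it appears in the file or stream."""
        order = ["moov", "mdat"]
        for _moof, _mdat in self.fragments:
            order += ["moof", "mdat"]
        return order
```

With two moof/mdat pairs, `layout()` lists one moov and its mdat followed by the alternating moof and mdat boxes, which is the arrangement of drawing 3.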
[0004] 

[Problem(s) to be Solved by the Invention] The conventional technique described
above could not be applied directly to playback from the middle of contents, or
to a stream encoded in real time. This invention aims to provide a stream
conversion method that makes playback from the middle of contents, and playback
of a stream that is encoded in real time and continues indefinitely, possible
on a terminal that supports only the conventional method.
[0005] 

[Means for Solving the Problem] In order to attain the above purpose, the moof
part of the original stream is converted into moov in a distribution server.
[0006] 

[Embodiment of the Invention] The 1st example of this invention is shown in
drawing 6. In drawing 6, the image 51 is encoded by the encoding processing 52
to generate the original stream 53 in the MP4 format. The stream is first
accumulated by the accumulation processing 54; then the playback start time 60
is received and the accumulated stream 55 is read. The transform processing 56
generates the conversion stream 57, which starts from a time near the specified
point in the middle of the stream 53, and the generated stream 57 is
distributed to a terminal as a distribution stream 59 by the message
distribution processing 58.

[0007] Drawing 7 shows the relation between the original stream 53 and the
conversion stream 57 in the above transform processing. The original stream 53
has moov 21 at its head, followed by mdat 22, and thereafter repeats
combinations of a moof and its corresponding mdat: moof 23/mdat 24,
moof 25/mdat 26, and so on (in the following explanation, a media data mdat and
its corresponding moof are written as a pair using "/", as in moof/mdat). When
the AU corresponding to the start time is contained in mdat 24, the new start
point is set to the head of mdat 24, and the transform processing 56 generates
a new moov 70 from the information on each medium (image and audio) described
in moov 21 and the information on mdat 24 described in moof 23. Thereafter, the
mdats of the original stream 53 (mdat 24 and 26) are copied as they are, and
moof 25 is output as moof 71 after its serial number is changed. Stream 55 is
stream 53 with the parts that are unnecessary for generating stream 57, for
example mdat 22, skipped.

[0008] Drawing 8 is a flow chart showing the detail of the transform processing
described above. In transform processing 56, moov 21 of the original stream is
read first, and the header information described there is obtained. Next,
moof 23, which becomes the starting position of the new stream, is searched
for. In the moov output processing 80, the new moov 70 is output using the
header information and the information in the moof at the stream starting
position. After mdat 24, which corresponds to moof 23, is output following
moov 70, the loop processing 81, which outputs a predetermined number of
moof/mdat combinations, is started. In the loop, the termination judging 82
first checks whether a further moof/mdat combination exists and whether the
stream is to end. When the stream ends, control moves to processing 83, where
information already written in moov 70, such as the data size and playback
duration of the contents, is updated, and processing ends. Otherwise, the loop
processing 81 continues: the next moof is read, its serial number is corrected
to a new value, the corrected moof is output, and then the corresponding mdat
is output. After this, the termination judging 82 is performed again.
Conversion ends when (1) the last data of the contents has been output, (2) a
termination request arrives from the terminal, or (3) the server ends on its
own because of a distribution error, a time-out of the response from the
terminal, or the like.
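
The flow of drawing 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: boxes are modelled as Python objects rather than bytes, each moof is assumed to carry the time range of its mdat (`start_ms`, `end_ms`), new serial numbers are assumed to start at 1, and the final update of moov (processing 83) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Moof:
    sequence_number: int
    start_ms: int  # playback time of the first AU in the paired mdat
    end_ms: int    # playback time just past the last AU (exclusive)


Fragment = Tuple[Moof, bytes]  # one moof/mdat pair


def convert(header: Dict, fragments: List[Fragment],
            start_ms: int) -> Iterator[Tuple[str, object]]:
    """Yield the boxes of a conversion stream that starts near start_ms."""
    # Search for the moof whose time range contains the requested start time.
    first = None
    for i, (moof, _mdat) in enumerate(fragments):
        if moof.start_ms <= start_ms < moof.end_ms:
            first = i
            break
    if first is None:
        raise ValueError("start time is outside the accumulated stream")

    head_moof, head_mdat = fragments[first]
    # moov output: original header plus the access information of the head moof.
    yield "moov", {"header": header, "access": head_moof}
    yield "mdat", head_mdat

    # Loop: renumber each later moof and copy its mdat unchanged.
    for new_seq, (moof, mdat) in enumerate(fragments[first + 1:], start=1):
        yield "moof", Moof(new_seq, moof.start_ms, moof.end_ms)
        yield "mdat", mdat
```

The point of the sketch is that the media data are copied untouched; only one moov is built and each later moof has a single field rewritten, which is why the conversion is cheap.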

[0009] Drawing 9 is a flow chart explaining the detail of the moov output
processing 80 of drawing 8. In the moov output processing, the header
information of the moov 21 already read is output first. Next, the
synchronization correction information for correcting the playback time of each
medium is output. Then the media access information of the head moof 23 is
output. The byte count of all of this output data is tallied and written at the
head of moov as the size of moov 70.

[0010] Drawing 10 explains the contents of the output processing 85 of the
synchronization correction information in drawing 9. It shows the
synchronization correction performed when an original stream consisting of two
media, audio and video, is converted into a stream that starts from the middle.
In general, the sampling times of the audio AUs and video AUs within one
content are asynchronous, so when a stream is extracted from the middle of the
original stream, the playback time of the head audio AU and that of the head
video AU do not coincide. That is, as in drawing 10, when a point near the
boundary of video AU 2 and video AU 3 is taken as the new start point, video
starts from AU 3 and audio starts from AU 8, and a time difference T arises
between them. In the synchronization correction processing, the value of T is
written into the conversion stream, so that at playback the terminal can
reproduce the same time relation between audio and video as in the original
stream. In drawing 10 the video is behind, so information meaning "delay the
start of video playback by T" is output; when the audio is behind, information
meaning "delay the start of audio playback by the predetermined time" is output
instead. Since all of these playback times are described in moov or moof of the
original stream, the judgment and the calculation are made by comparing the
time stamp values described there.
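
The calculation of T can be illustrated with a short sketch. The function name, the millisecond time stamps, and the rule of starting each medium at its first AU at or after the requested time are assumptions made for this example; the text above only states that T is obtained by comparing time stamps.

```python
from bisect import bisect_left
from typing import List, Tuple


def sync_correction(video_ts_ms: List[int], audio_ts_ms: List[int],
                    start_ms: int) -> Tuple[int, int, str, int]:
    """Return (first video index, first audio index, delayed medium, T in ms).

    Both time stamp lists must be sorted in ascending order.
    """
    v = bisect_left(video_ts_ms, start_ms)  # first video AU at or after start
    a = bisect_left(audio_ts_ms, start_ms)  # first audio AU at or after start
    if v == len(video_ts_ms) or a == len(audio_ts_ms):
        raise ValueError("start time is past the end of a medium")
    diff = video_ts_ms[v] - audio_ts_ms[a]
    if diff >= 0:
        # The head video AU is later: delay the start of video playback by T.
        return v, a, "video", diff
    # The head audio AU is later: delay the start of audio playback by T.
    return v, a, "audio", -diff
```

For example, with video AUs every 100 ms and audio AUs every 30 ms, a start point of 170 ms selects the video AU at 200 ms and the audio AU at 180 ms, so video playback is delayed by T = 20 ms.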

[0011] Note that, whereas the playback times of audio AUs are fixed and short,
for example a 30 ms cycle, the playback interval of video AUs is long, for
example 100 ms at 10 frames per second; moreover, with low-rate coding the
frame rate is variable and the interval is often irregular. For this reason,
terminals often play back using the audio period as the reference period. That
is, the terminal is configured to perform video output processing, on the basis
of the audio playback processing, only when video playback is needed.
Therefore, the burden on the terminal can be reduced by choosing the playback
starting position so that video is displayed at the same time as or after the
start of the reference audio playback, that is, so that "the start of video
playback is delayed". Conversely, if audio playback is delayed, video playback
must start before the reference processing starts, the processing at start-up
requires a control method different from the usual processing, and additional
software or hardware is needed on the terminal side. As described above,
according to the example explained with drawings 6 to 10, the server side can
produce a stream starting from the middle of the original contents with only
the slight processing of generating a new moov and making a minor correction to
moof, while the terminal side can play back from the middle with exactly the
same processing used to receive and play back a conventional stream.

[0012] When the configuration of drawing 8 in particular is combined with a
video server, both distribution from the head of the contents and distribution
from the middle of the contents can be realized while holding only a single
copy of the data. The capacity of the video server's storage equipment can
therefore be reduced, or more data can be held in storage equipment of fixed
capacity.

[0013] Drawing 11 shows the 2nd example of this invention. Drawing 11 is a
processing configuration that takes the original stream encoded in real time (a
real-time image), starts it from an arbitrary point in time, and distributes it
as a new conversion stream. The input image 51 is encoded in real time by the
encoding processing 52 to generate the original stream 53. In the real-time
data-conversion processing 101, this original stream 53 is converted in real
time into the conversion stream 57, which starts from the time an instruction
is received, and the message distribution processing 58 distributes it. As in
drawing 12, this is used for applications in which two or more terminals
access, at any time, the image being captured by a surveillance camera or the
like.

[0014] Drawing 13 is a flow chart explaining the detail of the real-time
data-conversion processing of drawing 11. First, before conversion starts, the
header information of stream 53 is acquired and the accumulation processing 120
is performed. Next, the processing waits for a distribution request from a
terminal; while there is no distribution request, the next moof or the next
mdat of stream 53 is searched for.

[0015] When there is a distribution request, the first moof after the request
is searched for, and moov is generated and output from this moof information
and the previously acquired header information. Subsequent processing is the
same as in drawing 8. However, in addition to the termination cases of
drawing 8, real-time operation includes the case where processing "does not
end". It is therefore necessary to write information meaning "indeterminate",
"infinity", "real-time distribution", or the like in the fields at the head
moov, such as the data size and the playback duration (contents length) of the
contents, so that this case can be distinguished from the case of drawing 8, in
which the data size is finite. Also in the case of drawing 8, the correction 83
of the data written in moov can be omitted by writing information meaning
"indeterminate", "infinity", "real-time distribution", or the like in the
fields at the head moov, such as the data size and the playback duration.
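
The real-time variant of drawings 11 and 13 can be sketched as a generator over a live sequence of moof/mdat pairs. This is a rough illustration only: `request_pending` is a hypothetical callback that reports whether a terminal has asked for distribution, the `UNKNOWN` marker is an assumed stand-in for the "indeterminate" value written into the size and duration fields, and boxes are modelled as Python objects rather than bytes.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

UNKNOWN = "indeterminate"  # assumed marker for open-ended size and duration


def live_convert(header: Dict,
                 live_fragments: Iterable[Tuple[object, bytes]],
                 request_pending: Callable[[], bool]
                 ) -> Iterator[Tuple[str, object]]:
    """Yield a conversion stream that begins at the first moof after a request."""
    started = False
    seq = 0
    for access_info, mdat in live_fragments:
        if not started:
            if not request_pending():
                continue  # no distribution request yet: skip this moof/mdat
            started = True
            # The first moof after the request becomes part of the new moov.
            yield "moov", {"header": header, "access": access_info,
                           "size": UNKNOWN, "duration": UNKNOWN}
            yield "mdat", mdat
            continue
        # Later pairs are forwarded with a corrected serial number.
        seq += 1
        yield "moof", {"sequence_number": seq, "access": access_info}
        yield "mdat", mdat
```

Because the size and duration fields are marked as unknown from the start, no later correction of moov is needed, which matches the remark that correction 83 can be omitted.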

[0016] Although temporarily storing moofs and their corresponding mdats in a
buffer or the like increases the delay between the original stream and the
distribution stream and slightly impairs the real-time nature, it can
compensate for the time difference between the issue of a distribution request
from a terminal and the start of video delivery over the Internet, so that
distribution can begin from the image at the time of the distribution request.

[0017] Although header information is acquired at the start of encoding in the
accumulation processing 120 above, the following alternatives are also
possible.

(1) Distribute header information periodically within stream 53 or over a
channel provided alongside stream 53.

(2) The real-time transform-processing section queries the encoder section for
header information, and for every inquiry the encoder section returns the
header information within stream 53 or over a channel provided alongside
stream 53.

(3) Record the header information in the real-time transform-processing section
beforehand, and have the encoder section process with the same parameters.
[0018] Drawing 14 shows the moov output processing for real-time data. The
contents of the processing are the same as in drawing 9, but to secure the
real-time nature, the generated moov is output to a temporary buffer, and
immediately after moov generation is completed, the moov data in the buffer are
distributed by the data output processing 125.

[0019] Drawing 15 shows the 3rd example of this invention. In drawing 15, a
plurality of streams with different bit rates (three in the example of
drawing 16) are prepared for the same contents, and variable-bit-rate
transmission is achieved by switching among the streams in units of moof/mdat
combinations according to requests from the terminal. Drawing 16 is an example
applied to a system in which the bit rate of the line to the terminal varies,
making it possible to distribute at a bit rate adapted to the network bit rate.

[0020] In drawing 16, distribution starts at 32 kbps. Around time 4 the
bandwidth expands and a change request to 48 kbps arrives in the middle of
time 4, so the rate is changed from time 5. Thereafter the bit rate is changed
to 64 kbps from time 11 and to 32 kbps from time 13.

[0021] Drawing 17 is a flow chart explaining the detail of the real-time
data-conversion processing corresponding to drawing 15. The processing is
almost the same as in drawing 13, but differs in that the bit rate setting
processing 150 is added at the start of distribution, and the bit rate
change-request judging 151 and, when there is a change request, the bit rate
modification processing 152 are added before the distribution of each
moof/mdat. Each moof/mdat is read from the stream corresponding to the bit rate
set at that time. For the streams of the other bit rates, moof/mdat are skipped
by processing 160 and 161 so that synchronization is always maintained.
Although the 3rd example has been explained on the basis of the 2nd example, it
can clearly also be combined with the 1st example.
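
Switching in units of moof/mdat can be sketched as below. The sketch assumes that all streams are cut into moof/mdat pairs covering the same time intervals, so that pair i of one stream can replace pair i of another; the function name and the shape of `change_requests` (a mapping from pair index to the requested rate) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def switch_streams(streams: Dict[int, Sequence[object]],
                   initial_kbps: int,
                   change_requests: Dict[int, int]
                   ) -> List[Tuple[int, object]]:
    """Pick one moof/mdat pair per time slot from the stream of the current rate.

    streams maps a bit rate in kbps to its time-aligned moof/mdat pairs.
    change_requests maps a slot index to the rate requested before that slot.
    """
    current = initial_kbps                       # bit rate setting at start
    slots = min(len(pairs) for pairs in streams.values())
    output = []
    for i in range(slots):
        if i in change_requests:                 # change-request judging
            current = change_requests[i]         # bit rate modification
        # Pairs of the other rates for this slot are simply skipped.
        output.append((current, streams[current][i]))
    return output
```

Because a switch only changes which pre-encoded pair is forwarded, no decoding or re-encoding takes place on the server.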

[0022] With the configuration of drawing 15 and the processing of drawing 17,
it becomes possible to distribute at a rate suited to the line rate at each
moment, even in a system in which the line rate to the terminal varies. It is
also effective when the line rate fluctuates little during distribution but the
actual line rate depends on the surrounding environment and cannot be
determined in advance. The line rate is measured using the following kinds of
environmental information.

[0023] (1) The terminal reports the receiving bit rate that it has measured.

(2) The terminal reports information related to the line rate, and the
distribution side sets a suitable bit rate based on the received information.
For example, for a channel that bundles multiple lines, the number of lines
acquired, the maximum bit rate, and so on are used; for a radio channel, the
radio signal strength, the error rate, and so on are used.

(3) When the server and the terminal operate synchronously, that is, in a
system where the terminal sends a notice of completion of data reception or a
request to send the next data, the server measures the transmitting bit rate.

(4) When the server and the terminal operate synchronously, the transmitting
bit rate is estimated from the amount remaining in the transmission buffer.

(5) The network notifies the communication bit rate.

(6) The network notifies information related to the line rate.

(7) A combination of the above.

[0024] Drawing 18 explains the outline of the 4th example of this invention.
The 4th example is a modification of the 1st example of drawing 6. Whereas the
1st example targets a stream that uses moof, in the form of drawing 3, the 4th
example targets a stream that does not use moof, in the form of drawing 1.

[0025] In the 4th example, the original stream to be input is in the format of
drawing 1, so its contents are as shown in the upper part of drawing 18. That
is, there is only one moov, which consists of header information 13 and media
access information 14. Here, the media access information 14 can be regarded as
logically divided into the data locations and time stamps for each piece of
short-duration fragmentation data. When a playback start time is specified,
moov 70 of the conversion stream is generated from the media access information
201 corresponding to that time and the header information 13. The corresponding
fragmentation data 202 are output as mdat 24. Thereafter, the fragmentation
data in mdat 12 and the corresponding media access information in moov 11 are
output in sequence.

[0026] Drawing 19 is a flow chart explaining the processing of the 4th example.
The basic processing is the same as in drawing 8, but in drawing 19 the
retrieval processing 210 for the media access information 201 of the start data
is performed instead of the moof 23 retrieval of drawing 8. Also, whereas
drawing 8 outputs the head mdat 24 as it is, drawing 19 performs retrieval 211
of the start data 202 based on the data location obtained in processing 210,
and outputs the obtained data as the head mdat 24.
[0027] Thereafter, in the loop 220, the media access information read-out 212
and the next-data read-out processing 214 in mdat are performed in the same
manner. Whereas drawing 8 corrected only the serial number of the moof that was
read and output it, in drawing 19 processing 213 generates moof from the
corresponding data in moov and outputs it.
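
The processing of drawing 19 can be sketched as follows. This is an illustration under assumptions that are not in the source: access entries carry an explicit size and a `sync` flag marking AUs at which random access is possible, fragments are formed by grouping a fixed number of AUs (`per_fragment`) rather than by duration, and offsets are rewritten relative to each new mdat.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AccessEntry:
    offset: int        # data location of the AU
    size: int          # AU length in bytes
    timestamp_ms: int  # playback time of the AU
    sync: bool         # True if playback can start at this AU


def split_from(entries: List[AccessEntry], mdat: bytes, start_ms: int,
               per_fragment: int) -> List[Tuple[int, List[AccessEntry], bytes]]:
    """Cut an unfragmented moov/mdat into (number, access info, data) pieces.

    Piece 0 supplies the access information for the new moov and the head
    mdat; each later piece becomes a generated moof with its mdat.
    """
    # Retrieval of the start data: first random-access AU at or after start_ms.
    start = next((i for i, e in enumerate(entries)
                  if e.sync and e.timestamp_ms >= start_ms), None)
    if start is None:
        raise ValueError("no random-access AU at or after the start time")

    pieces = []
    for number, begin in enumerate(range(start, len(entries), per_fragment)):
        group = entries[begin:begin + per_fragment]
        data = b"".join(mdat[e.offset:e.offset + e.size] for e in group)
        # Rewrite data locations so they are relative to the new mdat.
        rebased, position = [], 0
        for e in group:
            rebased.append(AccessEntry(position, e.size, e.timestamp_ms, e.sync))
            position += e.size
        pieces.append((number, rebased, data))
    return pieces
```

Unlike the moof-based case, every access entry has to be read and rewritten here, which reflects the higher processing cost of handling a stream that has no moof.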



[0028] Thus this invention can also be applied to a stream that does not use
moof. When applied to such a stream, the data in moov must be analyzed, and the
processing load is greater than when moof is used. On the other hand, whereas
the start point could be set only in units of moof when moof was used, the 4th
example can start from an arbitrary AU. However, the start AU must be an AU at
which random access is possible. Moreover, by applying the 4th example to a
stream that uses moof and analyzing the interior of moov or moof, a stream
whose start point is an AU in the middle of a moof can also be generated.

[0029] When the 2nd example, like the 1st through 4th examples in general, is
combined with a video server, the following effects are obtained. In the 2nd
example, real-time data can be distributed from an arbitrary moof/mdat.
Moreover, since the conversion requires little processing, a limited CPU can
distribute simultaneously to more terminals, each with a different distribution
starting position.

[0030] To change the bit rate, the alternatives are to install a transcoder (a
converter combining a decoder and an encoder), or to prepare uncompressed
contents on the video server and encode them in real time for every terminal at
distribution time. Compared with these, the 3rd example realizes distribution
that follows band fluctuation with a very small amount of processing. Moreover,
since the processing load is the same even when the band fluctuates differently
for every terminal, the number of terminals that a CPU of fixed capacity can
serve does not change.
[0031] 

[Effect of the Invention] In the 1st example, the server side can produce a
stream starting from the middle of the original contents with only the slight
processing of generating moov and making a minor correction to moof, while the
terminal side can play back from the middle without any change to the
processing used to receive and play back a conventional stream. In the 2nd
example, a stream encoded in real time can be started from an arbitrary point
in time and distributed as a new stream with only slight processing, while the
terminal side can play back the real-time image with exactly the same
processing used to receive and play back a conventional stream.

[0032] In the 3rd example, it becomes possible to distribute at a rate suited
to the line rate at each moment, even in a system in which the line rate to the
terminal varies. It is also effective when the line rate fluctuates little
during distribution but the actual line rate depends on the surrounding
environment and cannot be determined in advance.



DESCRIPTION OF DRAWINGS

[Brief Description of the Drawings] 

[Drawing 1] Drawing explaining MP4 file format. 

[Drawing 2] Drawing explaining the detail of moov 11 of MP4 file format.

[Drawing 3] Drawing explaining the file format which used moof. 

[Drawing 4] Drawing explaining the detail of moov21 when using moof. 

[Drawing 5] Drawing explaining the detail of moof23. 

[Drawing 6] The block diagram of the 1st example of this invention. 

[Drawing 7] Drawing explaining the outline of transform processing of the 1st
example of this invention.

[Drawing 8] The flow chart explaining the detail algorithm of the 1st example of 
this invention. 

[Drawing 9] The flow chart explaining the detail of moov output processing of 
drawing 8 . 

[Drawing 10] Drawing explaining the synchronous amendment between media. 
[Drawing 11] Drawing explaining the outline of transform processing of the 2nd
example of this invention.

[Drawing 12] Drawing explaining the application of the 2nd example of this 
invention. 

[Drawing 13] The flow chart explaining the detail algorithm of the 2nd example of 
this invention. 

[Drawing 14] The flow chart explaining the detail of moov output processing of 
drawing 13 . 

[Drawing 15] Drawing explaining the outline of transform processing of the 3rd 
example of this invention. 

[Drawing 16] Drawing explaining the outline of actuation of the 3rd example of 
this invention. 

[Drawing 17] The flow chart explaining the detail algorithm of the 3rd example of 
this invention. 

[Drawing 18] Drawing explaining the outline of actuation of the 4th example of 
this invention. 

[Drawing 19] The flow chart explaining the detail algorithm of the 4th example of 
this invention. 
[Description of Notations] 

11 moov 

12 mdat 



53 Original Stream 

56 Transform Processing 

57 Stream for Playback from the Middle
80 Moov Output Processing 

101 Real-time Data-Conversion Processing 
130 Bit Rate Change Request 